Abstract

For a system to understand natural language, it needs to be 
able to take natural language text and answer questions given 
in natural language with respect to that text; it also needs to 
be able to follow instructions given in natural language. To 
achieve this, a system must be able to process natural lan- 
guage and be able to capture the knowledge within that text. 
Thus it needs to be able to translate natural language text 
into a formal language. We discuss our approach to do this,
where the translation is achieved by composing the meanings of
the words in a sentence. Our initial approach uses an inverse
lambda method that we developed (and other methods) to learn
the meaning of words from the meaning of sentences and an
initial lexicon. We then present an improved method where
the initial lexicon is also learned by analyzing the training
sentence and meaning pairs. We evaluate our methods and
compare them with other existing methods on corpora for
database querying and robot command and control.

Introduction and Motivation 

We consider natural language understanding an important
aspect of human level intelligence. But what do we mean by
"language understanding"? In our view, a system that
understands language can, among other attributes, (i) take
natural language text and then answer questions given in
natural language with respect to that text and (ii) take
natural language instructions and execute those instructions
as a human would.

A system that can do the above must have several functional
capabilities, such as: (a) it must be able to process
language; (b) it must be able to capture knowledge expressed
in the text; (c) it must be able to reason, plan and in general
do problem solving, and for that it may need to search
efficiently for solutions; (d) it must be able to do high
level execution and control as per given directives; and (e)
to scale, it must be able to learn new language aspects
(e.g., new words). These functional capabilities are often
compartmentalized into different AI research topics. However,
good progress in each of these areas (over the last few
decades) provides an opportunity to use results and systems
from them and build upon those to develop a natural language
understanding system.

Over the last two decades our group has been focusing
on research into developing suitable knowledge representation
(KR) languages. Research by the broader community has
led to KR languages and systems that allow us to represent
various kinds of knowledge and to reason, plan and do
declarative problem solving using them. Various search
techniques are embedded in some of these systems, and one
such system from Potsdam (CLASP)¹ has been doing very well
in SAT competitions.² Similarly, various languages and
systems have been developed that can take directives in a
formal language and use them in high level execution and
control. These cover the aspects (c) and (d) mentioned above.

In our current research we use the existing results on (c) 
and (d) and develop an overall architecture that addresses 
the aspects (a), (b) and (e) to lead to a natural language un- 
derstanding framework. 

The first key aspect of our approach and our language un- 
derstanding framework is to translate natural language to 
appropriate formal languages. Once that is achieved we 
achieve (b) and then together with the (c) and (d) compo- 
nents we achieve (a). The second key aspect of our ap- 
proach and our language understanding framework is that 
we can reason and learn about how to translate new words 
and phrases. This allows our overall system to scale up to 
larger vocabularies and thus we achieve (e). 

In this paper we first give a brief presentation of our sys- 
tem and framework which was reported in an earlier limited 
audience conference/workshop. We then present some orig- 
inal work to enhance what was done then. 

Translating English to Formal languages 

Our approach to translating English to formal languages is
inspired by Montague's path-breaking thesis (Montague 1974)
of viewing English as a formal language. We consider each
word to be characterized by one or more λ-calculus formulas
and the translation to be obtained by composing the
appropriate λ-calculus formulas of the words as dictated by a
PCCG (Probabilistic Combinatorial Categorial Grammar).
The big challenge in this approach is to come up
with the right λ-calculus formulas for various words. Our

¹ http://www.cs.uni-potsdam.de/clasp/
² http://www.satcompetition.org/



approach, initially presented in (Baral et al. 2011), utilizes
inverse λ-calculus operators and generalization to obtain
semantic representations of words, and learning techniques to
distinguish between them. The system architecture of our
approach is given in Figure 1. The left block shows the overall
system that translates a sentence into a target formal language
using the PCCG grammar and the lexicon, while the right
block shows the learning module that learns the meaning of
new words (via Inverse λ and generalization methods) and
assigns weights to multiple meanings of words. We now
elaborate on some important parts of the system.

Inverse λ computation

The composition semantics of λ-calculus basically computes
the meaning of a phrase "a b" by α(β) or β(α), depending
on the CCG parse. Now suppose we know the meaning of
"a b" to be γ and also know the meaning of "a" to be α. By
inverse λ, we refer to obtaining β given α and γ.
Depending on whether γ is α(β) or β(α) we have two inverse
operators: Inverse_R and Inverse_L. We now give a quick
glimpse of Inverse_R as given in (Baral et al. 2011). Further
details are given in (Gonzalez 2010).

• Let G, H represent typed λ-calculus formulas,
J^1, J^2, ..., J^n represent typed terms, v_1 to v_n, v and
w represent variables and σ_1, ..., σ_n represent typed
atomic terms.

• Let f() represent a typed atomic formula. Atomic formulas
may have a different arity than the one specified and
still satisfy the conditions of the algorithm if they contain
the necessary typed atomic terms.

• Typed terms that are sub terms of a typed term J are
denoted as J_i.

• If the formulas we are processing within the algorithm
do not satisfy any of the if conditions then the algorithm
returns null.

Definition 1 Consider two lists of typed λ-elements A and
B, (a_1, ..., a_n) and (b_1, ..., b_n) respectively, and a formula H.
The result of the operation H(A : B) is obtained by replacing
a_i by b_i, for each appearance of A in H.

Definition 2 The function Inverse_R(H, G) is defined as:
Given G and H:

1. If G is λv.v@J, set F = Inverse_L(H, J).

2. If J is a sub term of H and G is λv.H(J : v), then F = J.

3. If G is not λv.v@J, J is a sub term of H and G is
λw.H(J(J_1, ..., J_m) : w@J_p, ..., @J_q) with 1 ≤ p, q, s ≤ m,
then F = λv_1, ..., v_s.J(J_1, ..., J_m : v_p, ..., v_q).

To illustrate the Inverse operators, assume that in the example
given in Table 2 the semantics of the word "in" is not known.
We can use the Inverse operators to obtain it as follows.
Using the semantic representation of the whole sentence,
answer(river(loc_2(stateid('arkansas')))), and the
semantics of the word "Name", λx.answer(x), we can
use the respective operators to obtain the semantics of "the
rivers in Arkansas" as river(loc_2(stateid('arkansas'))).
Repeating this process recursively we obtain
λy.y@loc_2(stateid('arkansas')) as the representation
of "in Arkansas" and λx.λy.y@loc_2(x) as the desired
meaning of "in".
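The first step of this example (recovering the meaning of "the rivers in Arkansas" from the sentence meaning and the meaning of "Name") can be sketched in Python. This is only an illustration of case 2 of Definition 2, not the full operator: the nested-tuple encoding of terms and the names `bind`, `inverse_simple` and `show` are assumptions made for this example.

```python
# Illustrative sketch only. Terms are nested tuples (functor, arg, ...) and an
# abstraction is ("lam", variable, body); both encodings are assumptions.

def bind(pattern, term, var):
    """Find what `var` must stand for so that `pattern` equals `term`.

    Returns None if there is no single consistent binding.
    """
    if pattern == var:
        return term
    if not (isinstance(pattern, tuple) and isinstance(term, tuple)):
        return None
    if len(pattern) != len(term) or pattern[0] != term[0]:
        return None
    binding = None
    for p, t in zip(pattern[1:], term[1:]):
        if p == t:
            continue  # identical subterms need no binding
        found = bind(p, t, var)
        if found is None or (binding is not None and found != binding):
            return None
        binding = found
    return binding


def inverse_simple(h, g):
    """Case 2 only: if G is λv.H(J : v), return the sub term J of H."""
    _, var, body = g
    return bind(body, h, var)


def show(term):
    """Render a nested-tuple term in the usual functional notation."""
    if isinstance(term, tuple):
        return f"{term[0]}({', '.join(show(a) for a in term[1:])})"
    return term


sentence = ("answer", ("river", ("loc_2", ("stateid", "'arkansas'"))))
name = ("lam", "x", ("answer", "x"))  # λx.answer(x)
rest = inverse_simple(sentence, name)
print(show(rest))  # river(loc_2(stateid('arkansas')))
```

Cases 1 and 3 of Definition 2, which handle function-typed arguments, would need a richer term representation than this sketch provides.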

Generalization and trivial solution 

Using INVERSE_L and INVERSE_R, we are able to
obtain new semantic representations of particular words in
the training sentences. To go beyond that, we use a notion
of generalization that we developed. For example, consider
the intransitive verb "fly", whose category as per a
CCG⁴ (Steedman 2000) is S\NP. Let us assume we obtain
a new semantic expression for "fly" as λx.fly(x) using
INVERSE_L and INVERSE_R. Generalization looks
up all the words of the same syntactic category, S\NP. It
then identifies the part of the semantic expression in which
"fly" is involved. In our particular case, it is the
subexpression fly. We can then assign the expression λx.w(x) to the
words w of the same category. For example, for the verb
"swim", we could add λx.swim(x) to the dictionary. This
process can be performed "en masse", by going through the
dictionary and expanding the entries of as many words as
possible, or "on demand", by looking up the words of the
same category when a semantic representation of a word in
a sentence is required. Even with generalization, we might
still be missing too much semantic information to be
able to use INVERSE_L and INVERSE_R. To make up
for this, we allow trivial solutions, where words or phrases
are assigned the meaning λx.x, λx.λy.(y@x) or similarly
simple representations, which basically mean that the word
may be ignored. The trivial solutions are used as a last
resort if neither inverse nor generalization is sufficient.
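A minimal sketch of the generalization step, assuming the lexicon stores λ-expressions as plain strings and that the word's own predicate can be located by a simple string match. Both are simplifications for illustration, and the function name `generalize` is hypothetical.

```python
def generalize(lexicon, categories, learned_word):
    """Copy the semantic templates of `learned_word` to words of its category.

    lexicon:    dict mapping a word to a set of λ-expressions (strings)
    categories: dict mapping a word to its CCG category
    """
    category = categories[learned_word]
    for expression in list(lexicon.get(learned_word, ())):
        if learned_word not in expression:
            continue  # the word's predicate does not occur; nothing to abstract
        for word, word_category in categories.items():
            if word != learned_word and word_category == category:
                # Substitute the new word for the learned word's predicate.
                lexicon.setdefault(word, set()).add(
                    expression.replace(learned_word, word))
    return lexicon


categories = {"fly": "S\\NP", "swim": "S\\NP", "Arkansas": "N"}
lexicon = {"fly": {"λx.fly(x)"}}
generalize(lexicon, categories, "fly")
print(lexicon["swim"])  # {'λx.swim(x)'}
```

This corresponds to the "en masse" variant for a single learned word; an "on demand" variant would run the same substitution only when a missing word is encountered in a sentence.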

Translation and the Overall Learning Algorithm 

Earlier we mentioned that a sentence is translated to a
representation in a formal language by composing the meanings
of the words in that sentence as dictated by a CCG. However,
in the presence of multiple meanings of words, a probabilistic
CCG is used, where the probability of a particular translation
is computed using weights associated with each word
and meaning pair. For a given sentence, the translation that
has the highest probability is picked. This raises the question
of how one obtains the weights. The weights are obtained
using standard parameter estimation approaches with
the goal that the weights should be such that they maximize
the overall probability of translating each of the sentences


³ A very brief review of the λ representation is as follows. The
formula λx.answer(x) basically means that x is an input and
when that input is given it replaces x in the rest of the
formula. This application of a given input is expressed via the
symbol @. Thus λx.answer(x)@a reduces to answer(a).

⁴ In a combinatorial categorial grammar (CCG), words are
associated with categories. The meaning of the category S\NP is that
if a word of category NP comes from the left then by combining
it with a word of category S\NP we get a phrase of category S.
For example, if the word "a" has category S\NP and the word
"b" has category NP then the two words can be combined into the
phrase "b a", which will have the category S. Similarly, the
category S/NP means that a word of category NP has to come from
the right for us to be able to combine.



[Diagram components: Sentence, CCG Parser, Lexicon, PCCG Computation,
Translation; Training corpus, Initial lexicon, Inverse λ, Generalization,
Parameter Estimation, Lexicon, Final lexicon]

Figure 1: Overall system architecture



in the training set (of sentences and their desired meanings)
to their desired meaning. We now present our overall learning
algorithm that combines inverse λ, generalization and
parameter estimation.

• Input: A set of training sentences with their corresponding desired representations
S = {(S_i, L_i) : i = 1...n} where S_i are sentences and L_i are desired expressions.
Weights are given an initial value of 0.1. An initial feature vector Θ_0.

• Output: An updated lexicon L_{T+1}. An updated feature vector Θ_{T+1}.

• Algorithm:

- Set L_0 = INITIAL_DICTIONARY(S)

- For t = 1...T

- Step 1: (Lexical generation)

- For i = 1...n.

* For j = 1...n.

* Parse sentence S_j to obtain T_j

* Traverse T_j

■ apply INVERSE_L, INVERSE_R and GENERALIZE to find new
λ-calculus expressions of words and phrases α.

* Set L_{t+1} = L_t ∪ α

- Step 2: (Parameter Estimation)

- Set Θ_{t+1} = UPDATE(Θ_t, L_{t+1})⁵

• return GENERALIZE(L_T, L_T), Θ(T)
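How the learned weights are used to pick a translation can be illustrated with a small sketch. The text only states that probabilities are computed from the weights of word and meaning pairs; the log-linear (softmax) form below is an assumption, as are the function name and the example weights, which are invented for illustration and are not results.

```python
import math

DEFAULT_WEIGHT = 0.1  # initial value of every weight, as in the algorithm above


def translation_probabilities(candidates, weights):
    """Turn summed (word, meaning) weights into a distribution over translations.

    candidates: dict mapping a translation to the (word, meaning) pairs it uses
    weights:    dict mapping (word, meaning) pairs to learned weights
    """
    scores = {
        translation: sum(weights.get(pair, DEFAULT_WEIGHT) for pair in pairs)
        for translation, pairs in candidates.items()
    }
    normalizer = sum(math.exp(score) for score in scores.values())
    return {t: math.exp(score) / normalizer for t, score in scores.items()}


# Hypothetical weights for an ambiguous noun.
weights = {
    ("Mississippi", "riverid('mississippi')"): 1.2,
    ("Mississippi", "stateid('mississippi')"): 0.3,
}
candidates = {
    "answer(riverid('mississippi'))": [("Mississippi", "riverid('mississippi')")],
    "answer(stateid('mississippi'))": [("Mississippi", "stateid('mississippi')")],
}
probabilities = translation_probabilities(candidates, weights)
best = max(probabilities, key=probabilities.get)
print(best)  # answer(riverid('mississippi'))
```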

Automatic generation of initial dictionary 

In tables 6 and 7 we compare the performance of our systems
INVERSE, INVERSE+ and INVERSE+(i) with other
systems that have similar goals. However, although other
systems had other issues, we were not happy that our systems
required a manually created initial dictionary consisting
of λ-calculus representations of a set of words. In the
rest of the paper we present an approach to overcome that by
automatically coming up with candidates for the initial
dictionary and letting the parameter estimation module figure
out the correct meaning. In particular, we present methods
to automatically come up with possible λ-calculus representations
of nouns and various other words that are part of the
initial vocabulary in (Baral et al. 2011). Unlike (Baral et
al. 2011), where each word in the initial vocabulary is
given a unique λ-calculus representation, our approach does
not necessarily come up with a single λ-calculus representation
of the words that are in the initial vocabulary in (Baral



⁵ For details on Θ computation, please see (Zettlemoyer and
Collins 2005).



et al. 2011), but sometimes may come up with multiple
possibilities.

We will now illustrate our approach to obtaining the initial
dictionary and the use of CCG and λ-calculus in obtaining
semantic representations of sentences on the Geoquery corpus
at http://www.cs.utexas.edu/users/ml/geo.html. Table 1
shows several examples of sentences with their desired
representations, while Table 2 shows a sample CCG parse with
its corresponding semantic derivation.

To be able to automatically create the entries in the initial
dictionary as given by (Baral et al. 2011), we need to
answer the following two questions. First, how do we find the
expression λx.answer(x) and how do we assign it to the word
"Name"? The word "answer" does not appear anywhere in the
sentence. Second, how do we know that the semantic
expression for "Arkansas" should be stateid('arkansas')?
The first question can be answered by looking at several
possible semantic representations as given in Table 1. They
share one common aspect, which is that they all contain
the predicate answer as the outermost expression. Thus,
we can assume that λx.answer(x) should be part of any
derivation as given by Table 2. In general, using the grammar
derivations for the meaning representations, we can compare
various representations and look for common parts, which
we will refer to as common structures. We identify these
common parts and assign them to certain relevant words
in the sentence, such as assigning the common expression
λx.answer(x) to the word "Name". To answer the second
question, we again look at the grammar derivations for
nouns, and analyze them to be able to obtain the semantic
expression for "Arkansas" as stateid('arkansas').

Table 2 shows an example syntactic and semantic derivation
for the sentence "Name the rivers in Arkansas.". The
syntactic categories for each word are given in the upper part
of the table. These are then combined using combinatorial
rules (Steedman 2000) to obtain the rest of the
syntactic categories. For example, the word "Arkansas"
of category N is combined with the word "in" of category
(NP\N)/N, to obtain the syntactic category of "in
Arkansas", NP\N. The lower portion of the table lists the



Sentence                                            Representation

Name the rivers in Arkansas.                        answer(river(loc_2(stateid('arkansas'))))
How many people are there in New York?              answer(population_1(stateid('newyork')))
How high is Mount McKinley?                         answer(elevation_1(placeid('mountmckinley')))
Name all the lakes of US.                           answer(lake(loc_2(countryid('usa'))))
Name the states which have no surrounding states.   answer(exclude(state(all), next_to_2(state(all))))

Table 1: Example translations.



Name    the     rivers    in          Arkansas.
S/NP    NP/NP   N         (NP\N)/N    N
S/NP    NP/NP   N         NP\N
S/NP    NP/NP   NP
S/NP    NP
S

Name            the     rivers         in                  Arkansas.
λx.answer(x)    λx.x    λx.river(x)    λx.λy.y@loc_2(x)    stateid('arkansas')
λx.answer(x)    λx.x    λx.river(x)    λy.y@loc_2(stateid('arkansas'))
λx.answer(x)    λx.x    river(loc_2(stateid('arkansas')))
λx.answer(x)    river(loc_2(stateid('arkansas')))
answer(river(loc_2(stateid('arkansas'))))

Table 2: CCG and λ-calculus derivation for "Name the rivers in Arkansas."



semantic representations of each word using λ-calculus.
These are combined by applying the formulas one to another,
following the syntactic parse tree. For example,
the semantics of "Arkansas", stateid('arkansas'), is applied
to the semantics of "in", λx.λy.y@loc_2(x), yielding
λy.y@loc_2(stateid('arkansas')).
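The semantic composition for "Name the rivers in Arkansas." can be traced with ordinary Python closures. This is only an illustration in which logical forms are built as strings; the variable names are chosen for this example.

```python
# Word meanings as Python functions that build logical-form strings.
name = lambda x: f"answer({x})"                # λx.answer(x)
the = lambda x: x                              # λx.x
rivers = lambda x: f"river({x})"               # λx.river(x)
in_ = lambda x: (lambda y: y(f"loc_2({x})"))   # λx.λy.y@loc_2(x)
arkansas = "stateid('arkansas')"

# Combine in the order given by the CCG parse.
in_arkansas = in_(arkansas)                 # λy.y@loc_2(stateid('arkansas'))
rivers_in_arkansas = in_arkansas(rivers)    # river(loc_2(stateid('arkansas')))
the_rivers_in_arkansas = the(rivers_in_arkansas)
result = name(the_rivers_in_arkansas)
print(result)  # answer(river(loc_2(stateid('arkansas'))))
```

Note that "in Arkansas" takes the meaning of "rivers" as its argument, which is why its category NP\N expects the noun on its left.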

Let us first discuss the common structures of a logical
form. For example, for the Geoquery corpus, as shown in
Table 1, many queries are of the form answer(X) where X
is a structure corresponding to the actual query. Similarly, by
analyzing the Robocup corpus, we realize that all the queries
are of the form ((A) (do B)), (definer C (B)) or (definec
C (B)), where C is an identifier and A and B are some other
constructs in the given language. The main attribute of these
expressions is that they define the structure(s) of the desired
meaning representation.

The second component of the dictionaries was the semantic
representations of nouns. Unlike the common structures,
these need to be generated for as many nouns as possible
to ensure that the system is capable of learning the missing
semantic representations. For example, in GeoQuery, the noun
"Arkansas" is represented as stateid('arkansas').⁶ For
Robocup, the compound noun "player 5" can be represented
as (player our {5}).

Thus our task in being able to automatically obtain these
is twofold. We first need to identify the common structures,
find the appropriate λ-calculus formulas and pick the
words to which we will assign them. The second part of our
goal is to find the corresponding λ-calculus expressions for
nouns and compound nouns.

We will assume this process is done on the training data
and that a full syntactic parse of the sentences, as well as
the parse of the desired formal representation, is given.

Common structures In order to look for the common 
structures, we will compare the derivation structures of var- 
ious formulas and look for common structures in them. To 

⁶ We are using the funql representation, although the same
approach is applicable to the prolog one.



limit the potential search, and based on our previous
experience, we will only look for the common parts at the top
parts of the derivation. Also, in order to be more precise and
keep the computation within reasonable bounds, instead of
looking at the whole grammar for meaning representations,
we will look at the derivations of the meaning representations
of the training data. This is a reasonable assumption,
as in general the number of structures in the target language
can be assumed to be less than the amount of training data,
as in the case of Geoquery and CLANG.

Definition 3 Given a context free grammar G with an initial
symbol S, a set of non-terminals N, a set of terminals T, a
set of production rules P and a string w = x_1, ..., x_n, where
the x_i are terminal or non-terminal symbols, a production d is
a transformation x_1, ..., x_n ⇒ x_1, ..., x_{i-1}, A, x_{i+1}, ..., x_n
such that x_i → A is in P. We will say that x_i → A
corresponds to d.

Given a sequence of productions d* = d_1, ..., d_n, a derivation
tree t corresponding to d* is given as:

- If n = 1, let X → X_1, X_2, ..., X_n be the rule
corresponding to d_1. Then t is a tree with X as the root node,
which has n children, in order, left to right, X_1, X_2, ..., X_n.

- If t' is a derivation tree corresponding to d_1, ..., d_{n-1}
and X → X_1, X_2, ..., X_n is the rule corresponding to d_n,
then t is given as t' with n children added, in order, left to
right, X_1, X_2, ..., X_n, to the left most leaf X of t'.

A λ tree is a pair (V, t), where V is a list of λ bound
variables and t is a tree, where each interior node of t is
a non terminal symbol from N and each leaf node of t is a
terminal symbol from T or a variable from V.

Given two sequences of productions d*_1 and d*_2 with their
corresponding derivation trees t_1 and t_2, a λ tree (V, t_c) is a
common template of t_1 and t_2 iff there exist two sequences
of applications S_1 = X_1, ..., X_n and S_2 = Y_1, ..., Y_n such
that when we apply each X_i to each V_i, i = 1, ..., n, in t_c we
obtain a subtree of t_1 and when we apply each Y_i to each V_i,
i = 1, ..., n, in t_c we obtain a subtree of t_2.



answer( RIVER )  ->  river( RIVER )  ->  loc_2( STATE )

answer( PLACE )  ->  lake( PLACE )  ->  loc_2( COUNTRY )

CITY  ->  city( CITY )  ->  loc_2( STATE )  ->  stateid( STATENAME )

Table 3: Sample derivation trees



Example derivation trees and a common template are
given in Tables 3 and 4.

λv.answer( v )

Table 4: Sample common template.

Thus, based on the above definitions, to look for
the common structures in the desired meaning representations,
we will look for common trees between
derivations which are rooted at the initial symbol.
As an example, consider the following parts of the
derivation, obtained directly from the Geoquery corpus,
for answer(river(loc_2(stateid('arkansas')))) and
answer(lake(loc_2(countryid('usa')))).

• S →(1a) answer(RIVER)

• →(2a) answer(river(RIVER))

• →(3a) answer(river(loc_2(STATE)))

• S →(1b) answer(PLACE)

• →(2b) answer(lake(PLACE))

• →(3b) answer(lake(loc_2(COUNTRY)))

Starting from the initial non-terminal S, we can see that
the rules (1a) and (1b) are already different. They share a
common part in having the terminal symbols answer( and
). Thus, if we replace all the non-terminals in the common
parts of the derivation with λ bound variables, we obtain the
common part of the derivations as λv.answer(v), where v
is the new λ bound variable.

In general, given a derivation, we start at the initial symbol
and follow the derivation tree level by level while comparing
the nodes in the derivation tree. We then collect all
the common terminals from this subtree, and replace all the
differing non-terminals with λ bound variables. Note that
there might be multiple such structures, as in the case of
the Robocup corpus. In that case we would store and use all of
them and the learning part of the system would take care of
picking the proper ones.
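The level-by-level comparison can be sketched as follows. The tree encoding (a non-terminal is a pair of name and children, a terminal is a string) and the function name `common_template` are assumptions for this illustration; the sketch compares exactly two trees and does not limit the depth of the search.

```python
def common_template(t1, t2):
    """Return (variables, λ-expression) shared by two derivation trees.

    A non-terminal is a pair (name, children); a terminal is a string.
    """
    variables = []

    def walk(a, b):
        if isinstance(a, str) or isinstance(b, str):
            return a if a == b else None  # terminals must agree exactly
        (name_a, kids_a), (name_b, kids_b) = a, b
        if name_a == name_b and len(kids_a) == len(kids_b):
            mark = len(variables)
            parts = [walk(x, y) for x, y in zip(kids_a, kids_b)]
            if all(p is not None for p in parts):
                return "".join(parts)
            del variables[mark:]  # undo variables created by the failed attempt
        # Differing non-terminals are replaced by a fresh λ bound variable.
        variables.append(f"v{len(variables) + 1}")
        return variables[-1]

    body = walk(t1, t2)
    return variables, "".join(f"λ{v}." for v in variables) + body


# Derivations (1a)-(3a) and (1b)-(3b); STATE and COUNTRY are left unexpanded.
river_tree = ("S", ["answer(",
                    ("RIVER", ["river(",
                               ("RIVER", ["loc_2(", ("STATE", []), ")"]),
                               ")"]),
                    ")"])
lake_tree = ("S", ["answer(",
                   ("PLACE", ["lake(",
                              ("PLACE", ["loc_2(", ("COUNTRY", []), ")"]),
                              ")"]),
                   ")"])
print(common_template(river_tree, lake_tree))  # (['v1'], 'λv1.answer(v1)')
```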

After finding the common structures between the derivations,
we need to find the words to which we assign them.
Since the structures are supposed to define the common
structures of the desired representations, it is reasonable to
try to assign them to words which, in a sense, "define" the
sentences. In our case, we look for words that are usually
last to combine in the CCG derivation. The reasoning is that
when looking for the common structures, we looked at the
top parts of the derivation of meaning representations. Thus
it is reasonable to try to assign them to words which are in
the top parts of the derivation in the syntactic parse of the
sentence. Note that these words might not be the ones with
the most complex categories. In practice, such words are
usually verbs, wh-words or some adverbs.

Definition 4 Given a CCG parse tree T of a sentence s and
a word w from s, w is a top word if there is no other
word w' from s such that level(w') < level(w).
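Definition 4 amounts to selecting the words with the smallest level. A small sketch, assuming that level(w) is the depth of the word's leaf in the CCG parse tree (the definition above does not spell this out) and using a hypothetical function name:

```python
def top_words(levels):
    """Return the words for which no other word has a strictly smaller level.

    levels: dict mapping each word of the sentence to its level in the parse
            tree (assumed here to be the depth of its leaf; smaller means
            closer to the root).
    """
    best = min(levels.values())
    return [word for word, level in levels.items() if level == best]


# Depths read off the derivation in Table 2, where "Name" combines last.
levels = {"Name": 1, "the": 2, "rivers": 3, "in": 4, "Arkansas": 4}
print(top_words(levels))  # ['Name']
```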

Given a set of training pairs (S_i, L_i), i = 1, ..., k, where
S_i is a sentence and L_i is the corresponding desired logical
form, together with a syntactic parse of S_i and the derivation
of L_i, we can obtain the candidate common structures using
the following algorithm, denoted as INITIAL_C.

• Input:

A set of training sentences with their corresponding desired representations S = {(S_i, L_i) :
i = 1...n} where S_i are sentences and L_i are desired expressions. A CCG grammar G for
sentences S_i. A CFG grammar G' for representations L_i.

• Output:

An initial lexicon L_0.

• Algorithm:

- Step 1: (Word selection)

- For i = 1...n.

* Parse S_i using the CCG grammar G to obtain parse tree t_i. Find all the top words of t_i
and store them in W_i.

- Step 2: (λ-expression generation)

- For i = 1...n.

* For j = 1...n.

* Parse derivations L_i and L_j using the CFG grammar G' to obtain the derivation trees
T_i and T_j.

* Starting from roots, compare T_i and T_j and find the largest common template (V, T),
such that T is rooted at the initial symbol of the grammar, S.

* Concatenate all the leaves of T together to form a λ-expression γ. For each v ∈ V, add
λv. in front of γ.

* Add γ as semantic expression to each of the words in W_i and W_j.⁷

- Set L_0 = ∪_i W_i

- return L_0

Nouns In order to derive potential λ-expression candidates
for nouns, instead of looking at the top of the derivation
trees and finding words, we match the nouns with the
terminals in the leaves of the derivation tree and then look

⁷ This step exhaustively assigns the new semantics to all the top
words. While not optimal, the learning part of the overall algorithm
takes care of figuring out the proper assignment.



for non-terminals which can produce them. As we traverse
upwards towards the root, we look for other terminals which
are produced by the non-terminals we encounter. At each
encountered non-terminal, we generate potential candidate
λ-expressions by analyzing the current subtree and store
them. As in the previous case, we leave it to the parameter
learning part of the overall algorithm to figure out the
proper ones. Our approach can be illustrated as follows.

Let us look at an example of rules deriving
city(loc_2(stateid('virginia'))) from the sentence
"Give me the cities in Virginia.", also given by Table 3.

• CITY →(1f) city(CITY)

• CITY →(2f) loc_2(STATE)

• STATE →(3f) stateid(STATENAME)

• STATENAME →(4f) 'virginia'

Let us assume that the noun we are interested in is "Virginia".
First, we will attempt to match it to a terminal in
the derivation, which in this case is 'virginia'. We will
then traverse the tree upwards. In this case, we first reach
the non-terminal STATENAME. Since 'virginia' is the
only child, we add 'virginia' as a potential candidate
representation of "Virginia". Continuing recursively, we arrive
at the non-terminal STATE. It has additional terminal symbols
as children, stateid( and ). We try to match these with
the sentence and, after being unsuccessful, we concatenate
the leaves of the current subtree to generate another potential
candidate, which yields stateid('virginia'). Continuing
to traverse, we arrive at the non-terminal CITY in the
rule (2f). As in the previous case, it has terminal symbols
loc_2( and ) as children, and we are unable to match them
onto the sentence. Thus we again concatenate the leaves,
leading to loc_2(stateid('virginia')) as a potential representation
candidate for the word "Virginia". Continuing upwards
in the tree, we reach the non-terminal symbol CITY
given by the rule (1f). In this case, we can match one of its
children, the terminal city(, with some words in the sentence
and we stop. This approach produces three possible
representations for "Virginia": 'virginia', stateid('virginia') and
loc_2(stateid('virginia')). However, during the training
process the first one does not yield any new semantic data
using the inverse lambda operators, while the third one is too
specific and can only be used in very few sentences.
Consequently, their weights are very low and they are not used,
leaving stateid('virginia') as the relevant representation.
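The upward traversal in this example can be sketched as follows. The derivation is simplified to a bottom-up chain of rules, each given by the terminal text before and after the child. The prefix-based `similarity` function is a stand-in chosen for this illustration (the partial string matching actually used is not specified beyond an accuracy threshold), and all function names are hypothetical.

```python
import os.path


def similarity(terminal, word):
    """Share of the terminal's name that is a common prefix with `word`.

    Illustrative stand-in for partial string matching.
    """
    name = terminal.strip("()'").lower()
    if not name:
        return 0.0
    shared = len(os.path.commonprefix([name, word.lower()]))
    return shared / len(name)


def noun_candidates(noun_terminal, chain, sentence_words, accuracy=0.7):
    """Collect candidate representations while walking up the derivation.

    chain: bottom-up list of (prefix, suffix) terminal strings that each rule
           wraps around its child.
    """
    candidates = []
    current = noun_terminal
    for prefix, suffix in chain:
        # Stop once a terminal of the rule matches another word of the sentence.
        if any(similarity(prefix, w) >= accuracy for w in sentence_words):
            break
        current = prefix + current + suffix
        candidates.append(current)
    return candidates


# Rules (4f), (3f), (2f), (1f) for "Give me the cities in Virginia."
chain = [("", ""), ("stateid(", ")"), ("loc_2(", ")"), ("city(", ")")]
words = ["give", "me", "the", "cities", "in"]
print(noun_candidates("'virginia'", chain, words))
```

The traversal stops at rule (1f) because city( matches the word "cities", so three candidates are produced, as in the text.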

We will now define an algorithm to obtain the candidate
noun expressions from the training set, denoted by
INITIAL_N. For our experiments, maxlevel was set to
2 and accuracy was set to 0.7.

• Input: A set of training sentences with their corresponding desired representations
S = {(S_i, L_i) : i = 1...n} where S_i are sentences and L_i are desired expressions. A CCG
grammar G for sentences S_i. A CFG grammar G' for representations L_i.

FN(t) - given a CCG parse tree t, returns all the nouns in t.

nMATCH(w) - returns a set of terminal symbols partially matching the string w with
accuracy a. Returns a single non terminal if w is a single word. The accuracy for nMATCH(w)
is given by the partial string matching, given as the percentage of similar parts in
between the strings.

MCYK(X) - given a set of terminal and non terminal symbols, finds the non-terminal symbol
which can yield all of them using a modified CYK algorithm.

maxlevel M - maximum number of levels allowed to traverse in the derivation trees.

• Output: An initial lexicon L'_0.

• Algorithm:

• Step 1: (λ-expression generation)

• For i = 1...n.

- Parse S_i using the CCG grammar to obtain t_i.

- Parse L_i using the CFG grammar to obtain T_i.

- Set W = FN(t_i).

- For each w_j ∈ W:

* Set X = nMATCH(w_j)

* Repeat a maximum of M times

■ Set N = MCYK(X).

■ Set T to be a subtree of T_i rooted at N.

■ For each leaf node n of T which is a match of some word w' of the sentence S_i, if
the path from n to N contains a non-terminal symbol, replace n with a new λ bound
variable v and add λv. to Γ.

■ Concatenate all the leaf nodes of T to form Γ'.

■ Set Γ = Γ.Γ', where . represents string concatenation.

■ Add (w_j, Γ) to L'_0.

■ If N has two or more non-terminal children, break.

■ If N has a child whose terminal symbol can be matched to any word of S_i but w_j,
break.

■ Set N = MCYK(N).

• return L'_0.

The algorithm stops when it encounters other terminals
because we are looking for the representations of specific
words. We assume each word is represented as a lambda
calculus formula. Once we encounter a terminal corresponding
to some other word of the sentence, we assume that word
has its own representation, which we do not want to add to
the representation of the current noun we are investigating.
The algorithm produces results such as λx.answer(x) for
the words list, name, what and stateid('virginia') for the
word Virginia. In the case of the CLANG corpus, some of the
results are λx.λy.(x)(do y) and λx.λy.definer 'x' y for each of
the words call, let, if.

Combining the output of both algorithms yields an initial 
lexicon which can be used by the system. Some of the results 



Word           Obtained representations

list           λx.answer(x)
Virginia       stateid('virginia')
what           λx.answer(x)
Mississippi    stateid('mississippi'), riverid('mississippi')
if             λx.λy.(x)(do y)
               λx.λy.definer x y
let            λx.λy.(x)(do y)
               λx.λy.definer x y
player 5       (player our {5})
mid field      λx.(x midfield)

Table 5: Examples of learned initial representations.

Evaluation 

Similarly to (Zettlemoyer and Collins 2009), we used the
standard GEOQUERY and CLANG corpora for evaluation.
The GEOQUERY corpus contained 880 English sentences
with their respective database queries in the funql
language. The CLANG corpus contained 300 entries specifying
rules, conditions and definitions in CLANG.

In all the experiments, we used the C&C parser of (Clark
and Curran 2007) to obtain syntactic parses for sentences.
In the case of CLANG, most compound nouns, including numbers,
were pre-processed. We used standard 10-fold cross validation
and proceeded as follows. A set of training and testing
examples was generated from the respective corpus. These were
parsed using the C&C parser to obtain the syntactic tree
structure. Next, the syntactic parses plus the grammar
derivations of the desired representations for the training
data were used to create a corresponding initial dictionary.
These, together with the training sets containing the training
sentences with their corresponding semantic representations
(SRs), were used to train a new dictionary with corresponding
parameters. Note that many of the words may still have been
missing their SRs; however, our generalization approach was
also applied when computing the meanings of the test data.
This dictionary was then used to parse the test sentences and
the highest scoring parse was used to determine precision and
recall. Since many words might have been missing their SRs,
the system might not have returned a proper complete semantic
parse. To measure precision and recall, we adopted the
measures given by (Wong and Mooney 2007) and (Ge and Mooney
2009). Precision denotes the percentage of returned SRs that
were correct, while Recall denotes the percentage of test
examples for which the pre-specified SR was returned.
F-measure is the standard harmonic mean of precision and
recall. For database querying, an SR was correct if it
retrieved the same answer as the standard query. For CLANG,
an SR was correct if it was an exact match of the desired SR,
except for argument ordering of conjunctions and other
commutative predicates.
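
The three measures can be sketched as follows. This is a minimal
illustration, assuming that recall is computed as the number of correct
SRs divided by the number of test examples; the names Scores, f_measure
and evaluate are hypothetical and are not part of the system.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    precision: float  # percentage of returned SRs that were correct
    recall: float     # percentage of test examples with a correct SR
    f_measure: float  # harmonic mean of precision and recall


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate(num_test: int, num_returned: int, num_correct: int) -> Scores:
    """Compute the measures from raw counts for one test set."""
    precision = 100.0 * num_correct / num_returned if num_returned else 0.0
    recall = 100.0 * num_correct / num_test if num_test else 0.0
    return Scores(precision, recall, f_measure(precision, recall))


if __name__ == "__main__":
    # Precision and recall of the A-INVERSE+ row of Table 6.
    print(round(f_measure(94.58, 90.22), 2))  # prints: 92.35
```

Because the system may fail to return a complete parse for some test
sentences, the number of returned SRs can be smaller than the number of
test examples, which is why precision and recall can differ.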

To evaluate our system, we compare it with the performance
results of several alternative systems with available data.
In many cases, the performance data given by (Ge and Mooney
2009) are used. We compared our system with the following
ones: the SYN0, SYN20 and GOLDSYN systems by (Ge and Mooney
2009), the system SCISSOR by (Ge and Mooney 2005), an SVM
based system KRISP by (Kate and Mooney 2006), a synchronous
grammar based system WASP by (Wong and Mooney 2007), the
CCG based system by (Zettlemoyer and Collins 2007), the
work by (Lu et al. 2008) and the INVERSE and INVERSE+
systems given by (Baral et al. 2011). The results for the
different corpora, if available, are given in Tables 6 and 7.
The work by (Percy, Michael, and Dan 2011) reports a 91.1%
recall on the GEOQUERY corpus but uses a 600 to 280 split.





System        Precision   Recall   F-measure
A-INVERSE+    94.58       90.22    92.35
INVERSE+      93.41       89.04    91.17
INVERSE       91.12       85.78    88.37
GOLDSYN       91.94       88.18    90.02
WASP          91.95       86.59    89.19
Z&C           91.63       86.07    88.76
SCISSOR       95.50       77.20    85.38
KRISP         93.34       71.70    81.10
Lu et al.     89.30       81.50    85.20

Table 6: Performance on GEOQUERY.





System        Precision   Recall   F-measure
A-INVERSE+    87.05       79.28    82.98
INVERSE+(i)   87.67       79.08    83.15
INVERSE+      85.74       76.63    80.92
GOLDSYN       84.73       74.00    79.00
SYN20         85.37       70.00    76.92
SYN0          87.01       67.00    75.71
WASP          88.85       61.93    72.99
KRISP         85.20       61.85    71.67
SCISSOR       89.50       73.70    80.80
Lu et al.     82.50       67.70    74.40

Table 7: Performance on CLANG.

Note: INVERSE+(i) and A-INVERSE+(i) denote evaluations where
"(definec" and "(definer" at the start of SRs were treated as
being equal.



The results of our experiments indicate that our approach
outperforms the existing parsers in F-measure and illustrate
that our approach scales well and is applicable to sentences
of various lengths. In particular, it is even capable of
outperforming the manually created initial dictionaries given
by (Baral et al. 2011). The main reason seems to be that,
unlike in (Wong and Mooney 2007), our approach actually
benefits from the simpler nature of funql compared to PROLOG.
The resulting λ-calculus expressions are often simpler, as
they do not have to account for variables and multiple
predicates. The increase in accuracy mainly resulted from the
decrease in the number of possible semantic expressions of
words. As we understand it, the work by (Baral et al. 2011)
would sometimes include many meanings of words. Our approach
reduces this number. A decrease was caused by not being able
to automatically generate some expressions that were manually
added in (Baral et al. 2011). The automatically obtained
dictionary contained around 32% of the semantic data of the
manually created one.

Most of the failures of our system can be attributed to the
lack of data in the training set. In particular, new syntactic
categories or semantic constructs rarely seen in the training
set usually result in a complete inability to parse those
sentences. In addition, given the syntactic parses, complex
semantic representations in lambda calculus are produced,
which are then often propagated via generalization and can
produce bad translations and interfere with learning.
Additionally, many of the words will have several possible
representations and the training set distribution might not
properly represent the desired one. The C&C parser that we
used was primarily trained on newspaper text (Clark and Curran
2007), and thus did have some problems with these different
domains and in some cases resulted in complex semantic
representations of words. This could be improved by using
a different parser, or by simply adjusting some of the parse
trees.

In the previous paragraphs we compared our system with
similar systems in terms of performance. We now give a
qualitative comparison of our approach with other learning
based approaches that can potentially translate natural
language text to formal representation languages:
(Zettlemoyer and Collins 2005), (Kate and Mooney 2006),
(Wong and Mooney 2006), (Wong and Mooney 2007), (Lu et al.
2008), (Zettlemoyer and Collins 2007), (Ge and Mooney 2009),
(Kwiatkowski et al. 2010) and (Percy, Michael, and Dan 2011).
(Zettlemoyer and Collins 2005) uses a set of hand crafted
rules to learn syntactic categories and semantic
representations of words using combinatory categorial grammar
(CCG) (Steedman 2000) and λ-calculus formulas (Gamut 1991).
The same approach is adopted in (Zettlemoyer and Collins
2007). (Kanazawa 2001), (Kanazawa 2003) and (Kanazawa 2006)
focus on computing the missing λ-expressions, but do not
provide a complete system. In (Ge and Mooney 2009), a word
alignment approach is adopted to obtain the semantic lexicon,
and rules which allow semantic composition are learned.
Compared to (Ge and Mooney 2009), we do not generate word
alignments for the sentences and their semantic
representations. We only use a limited form of pattern
matching to initialize our approach with several basic
semantic representations. We focus on the simplest cases, the
top and bottom of the trees, rather than performing a complete
analysis of the trees. We assign each word a λ-calculus
formula as its semantics and use the native λ-calculus
application, @, to combine them rather than computed
composition rules. The learning process then figures out which
of the candidate semantics to use. We use a different
syntactic parser which dictates the direction of the semantic
composition. Both approaches use a similar learning model
based on (Zettlemoyer and Collins 2005). The work by
(Kwiatkowski et al. 2010) uses higher-order unification.
Instead of using inverse, they perform a split operation which
can break a λ-expression into two. However, this approach is
not capable of learning more complex λ-calculus formulas and
lacks generalization. (Percy, Michael, and Dan 2011) uses
dependency-based compositional semantics (DCS) with lexical
triggers which loosely correspond to our initial dictionaries.

Conclusion and Discussion 

In this work we presented an approach to translate natural
language sentences into semantic representations. Using a
training set of sentences with their desired semantic
representations, our system is capable of learning the meaning
representations of words. It uses the parse of the desired
semantic representations under an unambiguous grammar to
obtain an initial dictionary, and then uses inverse λ
operators and generalization techniques to automatically
compute the semantic representations based on the structure of
the syntactic parse tree and known semantic representations,
without any human supervision. Statistical learning approaches
are used to distinguish the various potential semantic
representations of words and prefer the most promising one. In
this work, we are able to overcome some of the deficiencies of
our initial work in (Baral et al. 2011). Our approach here is
fully automatic and it generates a set of potential candidate
words for each noun based solely on the context free grammar
of the target language and the training data. The resulting
method is capable of outperforming many of the existing
systems on the standard corpora of GEOQUERY and CLANG.
There are many possible extensions to our work. One possible
direction is to experiment with additional corpora which use
temporal logic as a target language. Other directions include
improvements in inverse lambda computation and the application
of other learning methods such as sparse learning.